Algorithm-Based Fault Tolerance in Linear Algebra Tasks

Author

  • O. Maslennikow
Abstract

A modification of the weighted checksum method is proposed that makes it possible to derive fault-tolerant versions of most linear algebra algorithms. The aim is to detect and correct calculation errors caused by transient hardware faults. Using the proposed method, this paper designs a fault-tolerant version of the Faddeeva algorithm. Compared with the original algorithm, the new one requires approximately O(N) additional multiply-add operations. In return, it can detect and correct a single error in an arbitrary row or column of the input data matrices at each step of the algorithm. Finally, the results of an experimental verification of the proposed algorithm are presented.
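The abstract does not spell out the author's modification, so as an illustration of the underlying idea only, the following Python sketch applies a textbook weighted-checksum encoding (one plain and one weighted checksum row appended to A) to a matrix product C = A·B. It is not the paper's modified method and does not implement the Faddeeva algorithm; the function names `matmul`, `encode_columns` and `check_and_correct` and the tolerance `tol` are hypothetical. For each column, the plain syndrome gives the size of a single error and the ratio of the weighted to the plain syndrome gives its row, so the error can be subtracted out.

```python
"""Weighted-checksum ABFT sketch for a matrix product C = A @ B.

Illustrative only: a textbook weighted-checksum scheme, not the modified
method or the Faddeeva algorithm from the paper. All names are hypothetical.
"""


def matmul(A, B):
    """Plain triple-loop product of list-of-lists matrices."""
    inner, cols = len(B), len(B[0])
    return [[sum(row[t] * B[t][j] for t in range(inner)) for j in range(cols)]
            for row in A]


def encode_columns(A):
    """Return A with two checksum rows appended: the plain column sums
    (weight 1) and the weighted column sums (weight i + 1 for row i)."""
    n, m = len(A), len(A[0])
    plain = [sum(A[i][j] for i in range(n)) for j in range(m)]
    weighted = [sum((i + 1) * A[i][j] for i in range(n)) for j in range(m)]
    return [row[:] for row in A] + [plain, weighted]


def check_and_correct(C_full, tol=1e-6):
    """C_full = encode_columns(A) @ B: n data rows plus two checksum rows.

    For every column, recompute both checksums from the data rows. A single
    corrupted element with error e in row k gives syndromes d1 = e and
    d2 = (k + 1) * e, so k = d2 / d1 - 1 and the element is repaired by
    subtracting d1. Corrects C_full in place; returns the repaired positions.
    """
    n = len(C_full) - 2
    repaired = []
    for j in range(len(C_full[0])):
        d1 = sum(C_full[i][j] for i in range(n)) - C_full[n][j]
        d2 = sum((i + 1) * C_full[i][j] for i in range(n)) - C_full[n + 1][j]
        if abs(d1) <= tol and abs(d2) <= tol:
            continue  # column is consistent
        if abs(d1) <= tol or abs(d2) <= tol:
            # Only one syndrome fired: a checksum row was hit or there are
            # several errors; this sketch does not try to repair that case.
            raise ValueError(f"uncorrectable error pattern in column {j}")
        ratio = d2 / d1
        k = round(ratio) - 1
        if abs(ratio - round(ratio)) > tol or not 0 <= k < n:
            raise ValueError(f"uncorrectable error pattern in column {j}")
        C_full[k][j] -= d1
        repaired.append((k, j))
    return repaired


if __name__ == "__main__":
    A = [[1, 2], [3, 4], [5, 6]]
    B = [[7, 8, 9], [10, 11, 12]]
    C_full = matmul(encode_columns(A), B)  # 3 data rows + 2 checksum rows
    C_full[1][2] += 100.0                   # simulated transient fault
    print(check_and_correct(C_full))        # [(1, 2)]
    print(C_full[1][2])                     # 75.0
```

Because each column is checked independently, a fault that corrupts every element of one row of C is also repaired, one column at a time. A column in which only one syndrome is nonzero (a hit in a checksum row, or more than one error) is reported rather than repaired, which keeps the sketch short.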

Similar resources

Improving the palbimm scheduling algorithm for fault tolerance in cloud computing

Cloud computing is the latest technology that involves distributed computation over the Internet. It meets the needs of users through sharing resources and using virtual technology. The workflow user applications refer to a set of tasks to be processed within the cloud environment. Scheduling algorithms have a lot to do with the efficiency of cloud computing environments through selection of su...

Adaptive Algorithm-based Fault Tolerance for Parallel Computations in Linear Systems

This paper presents a dynamically adaptive stabilization scheme for parallel matrix computation. The scheme performs automatic error detection and correction through inserting redundant, but concurrent tracer computations within the folds of the regular computation. It also eliminates the costly row interchange used in classical pivoting. A fault-tolerant double wavefront matrix algorithm for a MI...

On-line soft error correction in matrix-matrix multiplication

Soft errors are one-time events that corrupt the state of a computing system but not its overall functionality. Soft errors normally do not interrupt the execution of the affected program, but the affected computation results cannot be trusted any more. A well known technique to correct soft errors in matrix–matrix multiplication is algorithm-based fault tolerance (ABFT). While ABFT achieves mu...

Algorithmic Techniques for Fault Detection for Sparse Linear Algebra

The growing complexity and variability of future computing systems is making it increasingly likely that individual circuits will produce erroneous results, especially when operated in low-energy modes. Previous techniques for Algorithm Based Fault Tolerance (ABFT) [7] have been proposed for detecting errors in dense linear operations, but have high overhead in the context of sparse problems....

Soft Error Resilient QR Factorization for Hybrid System

As general-purpose graphics processing units (GPGPUs) are increasingly deployed for scientific computing because of their raw performance advantages over CPUs, fault tolerance has become more of a concern than it was when they were used exclusively for graphics applications. The pairing of GPUs with CPUs to form a hybrid computing system for better flexibility and perform...

Publication date: 2005